Distributional Reinforcement Learning via Moment Matching

Authors

Abstract

We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. We formulate a method that learns a finite set of statistics of each return distribution via neural networks, as in the existing distributional RL literature. Existing methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which is both restrictive in representation and difficult in maintaining the predefined statistics. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide a finite-sample analysis of the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record for non-distributed agents.
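
To make the moment-matching objective concrete, the following is a minimal sketch of an empirical squared-MMD loss between predicted pseudo-samples and Bellman-target samples, written in PyTorch. The helper names (gaussian_kernel, mmd2), the Gaussian kernel mixture, and the bandwidth values are illustrative assumptions, not the authors' released code.

    import torch

    def gaussian_kernel(a, b, bandwidths):
        # Pairwise Gaussian kernel matrix between two 1-D sample sets,
        # summed over a mixture of bandwidths.
        d2 = (a.unsqueeze(1) - b.unsqueeze(0)) ** 2
        return sum(torch.exp(-d2 / h) for h in bandwidths)

    def mmd2(pred, target, bandwidths=(1.0, 2.0, 4.0, 8.0)):
        # Biased empirical estimate of squared MMD; driving it to zero matches
        # the distributions (and hence all moments captured by the kernel).
        k_pp = gaussian_kernel(pred, pred, bandwidths).mean()
        k_pt = gaussian_kernel(pred, target, bandwidths).mean()
        k_tt = gaussian_kernel(target, target, bandwidths).mean()
        return k_pp - 2.0 * k_pt + k_tt

    # Illustrative usage: pred holds N pseudo-samples of the return Z(x, a)
    # produced by the network; target holds r + gamma * (next-state pseudo-samples),
    # detached from the graph as a fixed Bellman target.
    pred = torch.randn(32, requires_grad=True)
    target = (0.5 + 0.99 * torch.randn(32)).detach()
    loss = mmd2(pred, target)
    loss.backward()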


Related Articles

A Distributional Perspective on Reinforcement Learning

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always be...
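
For reference, the object this line of work studies is the return distribution satisfying a distributional Bellman equation; the statement below uses standard notation (Z, R, gamma, next state X', next action A'), which is assumed here rather than quoted from the snippet above:

    Z^{\pi}(x,a) \overset{D}{=} R(x,a) + \gamma\, Z^{\pi}(X', A'),
    \qquad X' \sim P(\cdot \mid x,a), \quad A' \sim \pi(\cdot \mid X'),

with the classical value function recovered as Q^{\pi}(x,a) = \mathbb{E}\left[ Z^{\pi}(x,a) \right].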


A Distributional Perspective on Reinforcement Learning

To the best of our knowledge, the work closest to ours are two papers (Morimura et al., 2010b;a) studying the distributional Bellman equation from the perspective of its cumulative distribution functions. The authors propose both parametric and nonparametric solutions to learn distributions for risk-sensitive reinforcement learning. They also provide some theoretical analysis for the policy eva...


Distributional Reinforcement Learning with Quantile Regression

In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build ...
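
For context, the quantile-regression approach that paper introduces fits the return distribution at fixed quantile fractions. Below is a minimal sketch of a quantile Huber loss in PyTorch; the function name, tensor shapes, and the kappa default are illustrative assumptions, not the paper's code.

    import torch

    def quantile_huber_loss(pred_quantiles, targets, kappa=1.0):
        # pred_quantiles: (N,) values at quantile midpoints tau_i = (i + 0.5) / N
        # targets: (M,) samples from the Bellman target distribution
        n = pred_quantiles.shape[0]
        taus = (torch.arange(n, dtype=torch.float32) + 0.5) / n
        u = targets.unsqueeze(0) - pred_quantiles.unsqueeze(1)         # (N, M) pairwise TD errors
        huber = torch.where(u.abs() <= kappa,
                            0.5 * u ** 2,
                            kappa * (u.abs() - 0.5 * kappa))
        weight = (taus.unsqueeze(1) - (u.detach() < 0).float()).abs()  # asymmetric quantile weighting
        return (weight * huber / kappa).sum(dim=0).mean()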


Reinforcement Learning by Probability Matching

We present a new algorithm for associative reinforcement learning. The algorithm is based upon the idea of matching a network's output probability with a probability distribution derived from the environment's reward signal. This Probability Matching algorithm is shown to perform faster and be less susceptible to local minima than previously existing algorithms. We use Probability Matching to t...


RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a “fast” reinforcement learning algorithm, we p...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i10.17104